0. Introduction and setup
1. Simplest program to print to screen
2. Print to screen using video memory instead of BIOS interrupt
3. Simplest way to read keyboard input
4. Read multiple keyboard inputs
5. Read keyboard input and print without BIOS interrupt
6. Read multiple keys with BIOS interrupt
7. Read input and print multiple scancodes using video memory
8. Read input of single key and print using BIOS interrupt
9. Read and print multiple keys with BIOS interrupt
10. Read and print single key repeatedly
11. Read and print multiple keys
12. Read more than 512 bytes from floppy
13. Links

1. Simplest program to print to screen

img
[text]

This program prints an exclamation point to the screen. Note that it does not print at the first character position on the screen, but at the start of the first line after the messages that QEMU’s built-in firmware writes as it starts up:

img

There are two main display modes: text mode and graphics mode. Text mode is what older computers used: the display hardware has a built-in set of characters and can show, say, 25 rows of 80 characters, with a limited set of colors for each character and its background. In graphics mode you set each pixel individually and have a much wider range of colors, which makes modern windowed programs possible, but to draw letters you have to build your fonts pixel by pixel. This tutorial only covers text mode.

BITS 16

"BITS 16" tells nasm to produce 16-bit code. The CPU has different "modes". "Real mode" is the mode the CPU starts in and uses 16-bit code. To use 32-bit code you need "protected mode". "Long mode" is then used for 64-bit code (and long mode has a compatibility mode for running 32-bit code). Moving to protected mode or long mode requires executing specific setup code; this tutorial only covers real (16-bit) mode.

mov al, '!'

This instruction moves the ASCII value for an exclamation point to the “al” register. ASCII is a system for representing 128 common letters, numbers, symbols, and control codes as 7-bit numbers, each of which fits in a byte. The PC’s text mode uses an 8-bit character set that keeps the ASCII values and adds another 128 symbols, for 256 in total. A character must be given as its code in this set to be printed in text mode. For a list of ASCII values, see here.

The ASCII value corresponding to '!' is 0x21, so the hexadecimal value 21 (binary 0010 0001) is moved by this instruction to the al register. “mov” is an instruction that copies the value of the second argument to the location of the first argument. The first argument can either be a register, as it is in this case with “al”, or a location in memory at the address held in a register (if the register has brackets around it, like “[bx]”; in 16-bit code only bx, bp, si, and di can be used this way). The second argument can be a value in a register, a value at an address held in a register, or a value itself. The value can be written in binary, octal, decimal, hexadecimal, or, as in this case, as characters. Decimal is a number with no suffix (33), binary has a “b” suffix (00100001b), octal has a “q” or “o” suffix (41q), hexadecimal has a “0x” prefix (0x21) or an “h” suffix (21h), and characters are enclosed in single or double quotes. (Note that a hexadecimal number written with an “h” suffix must start with a digit, so one that starts with a letter must be prefixed with a "0", as in 0Ah.) A full list of numeric formats is found here and more information about characters is found here.
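As a small illustration of the numeric formats described above, the following sketch writes the same value in each notation. It is only meant to show syntax (it is not a bootable program), and the last two lines are illustrative additions that are not part of the tutorial’s program.

```nasm
BITS 16

; Each of the next six lines assembles to the same two bytes (B0 21):
; the operand is the value 0x21 written in a different notation.
mov al, '!'          ; character literal
mov al, 0x21         ; hexadecimal with "0x" prefix
mov al, 21h          ; hexadecimal with "h" suffix
mov al, 33           ; decimal
mov al, 41q          ; octal with "q" suffix
mov al, 00100001b    ; binary with "b" suffix

; Illustrative additions:
mov bl, 0Ah          ; an "h"-suffixed number that starts with a letter needs a leading 0
mov [bx], al         ; brackets: store al at the memory address held in bx
```

Comparing the assembled output of the first six lines (for example with a hex viewer) is a quick way to convince yourself that the notations are interchangeable.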

mov ah, 0x0E

Moves the hexadecimal value 0x0E to the ah register. This value selects which function of the interrupt called in the next instruction is used.

int 0x10

This instruction prints the exclamation point to the screen. "int" is an instruction that triggers a software interrupt, which here calls a routine provided by the BIOS (a "BIOS interrupt"). In the long run we want to avoid BIOS interrupts because, among other reasons, they do not work in protected or long mode. I’m not sure whether switching back to real mode each time you want to use an interrupt would be feasible, but I don’t suppose it would be efficient. Printing to the screen without the interrupt is shown in the next example; the interrupt is used here because it is the simplest way to print to the screen. The number following “int” tells which interrupt is called. According to Wikipedia, interrupt 0x10 provides a number of “video services”, and the value in the ah register when the interrupt is called selects which function of interrupt 0x10 is used. In this example ah holds 0x0E, which prints to the screen the character whose ASCII value is in the al register.

times 510-($-$$) db 0

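Pads the rest of the boot sector with zeros, from the current position up to and including the 510th byte. “$” is the address of the current line and “$$” is the address of the start of the section, so “$-$$” is the number of bytes emitted so far and “510-($-$$)” is the number of zero bytes still needed. For a full explanation of “$”, see the nasm documentation here.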

dw 0xAA55

BIOS looks for hexadecimal AA55 in bytes 511 and 512 as a "boot signature" (the two bytes are stored low byte first, so byte 511 is 0x55 and byte 512 is 0xAA). The signature confirms that the drive is one you intend to boot from. BIOS checks a sequence of places (on a normal computer, say, the CD drive and then the hard drive; I don’t know the exact order), so a game in the CD drive that lacks the signature is skipped, and your operating system on the hard drive is found and booted instead. When BIOS starts, it finds the first drive with the boot signature, loads the first 512 bytes of data from that drive, ending with the AA55, into memory, and then starts execution at byte 0 of those 512 bytes. BIOS only loads these 512 bytes, so if you want your program to be longer than that, the code within the first 512 bytes must itself load the rest (see section 12 for how to do this; all programs before that are less than 512 bytes).
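Putting the lines explained above together, the complete boot sector would look roughly like the following. This is a reconstruction from the walkthrough, not a copy of the original listing, so treat the comments as my reading of it.

```nasm
BITS 16                 ; assemble 16-bit (real mode) code

mov al, '!'             ; character to print (ASCII 0x21)
mov ah, 0x0E            ; select function 0x0E of interrupt 0x10
int 0x10                ; BIOS video service: print the character in al

times 510-($-$$) db 0   ; pad with zeros up to and including byte 510
dw 0xAA55               ; boot signature in bytes 511 and 512
```

A common addition, which is not part of the walkthrough, is an endless loop such as “jmp $” right after “int 0x10”, so that the CPU does not go on to execute the padding bytes.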

Note: When I started, I thought using the “h” suffix would be preferable for writing hexadecimal numbers as it is one less character to type and seems easier to read, but over time I found the “0x” prefix preferable because although it is harder to type and read when looking at a single line, it is easier to visually scan a program/look over many rows of code quickly and read or find specific values when the hexadecimal numbers do not have a suffix.

2. Print to screen using video memory instead of BIOS interrupt

To print to the screen without using interrupts, we need to use the video memory. A special section of memory starting at address 0xb8000 is the video memory, and in text mode the display shows text based on the data in the 4000 bytes starting at 0xb8000. Text mode has 25 rows of 80 characters (25 * 80 = 2000 characters), where each character is determined by two bytes: the first byte is the ASCII value of the character to display and the second byte is its color. The upper four bits of the color byte determine the background color for that character and the lower four bits determine the text color.

Decimal  Hex  Name          Decimal  Hex  Name
0        0    Black         8        8    Dark Gray
1        1    Blue          9        9    Light Blue
2        2    Green         10       A    Light Green
3        3    Cyan          11       B    Light Cyan
4        4    Red           12       C    Light Red
5        5    Magenta       13       D    Light Magenta
6        6    Brown         14       E    Yellow
7        7    Light Gray    15       F    White
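As an example of combining two entries from the table into one color byte, the sketch below builds the byte for white text on a blue background. The names BLUE, WHITE, ATTRIBUTE, and cell are made up for this illustration.

```nasm
BLUE        equ 0x1                     ; background color, from the table
WHITE       equ 0xF                     ; text color, from the table
ATTRIBUTE   equ (BLUE << 4) | WHITE     ; background in the upper 4 bits, text in the lower 4: 0x1F

; The two bytes that describe one character cell in video memory:
cell:
db '!', ATTRIBUTE                       ; character byte first, then the color byte
```

Assembled on its own, this produces just the two bytes 0x21 and 0x1F.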

In 16-bit mode you can only write directly to addresses up to 0xffff without using "segments", so we cannot access 0xb8000 directly. (Each hexadecimal digit is 4 bits, so 16 bits give 16 / 4 = 4 hexadecimal digits, which is 16^4 = 65,536 different addresses. Stated differently, with 16 bits you can address 2^16 = 65,536 locations.) Segments enable you to address 2^20 = 1,048,576 locations (addresses 0x00000 to 0xfffff) by using two registers instead of one. One register selects a subset, or “segment”, of the 2^20 addresses, and the second register holds the address within that segment. The value in the second register is called the "offset". The “segment registers” CS, DS, SS, ES, FS, and GS are used for storing the segment values. The format segment:offset is used for representing an address within a segment. The segments overlap: the first segment starts at 0x0000, and each subsequent segment starts at an address 16 bytes higher than the previous one.

The first segment just starts at 0x0000, so using that segment with any offset is the same as just using the value in the offset by itself to access that address directly. So for a sample offset, 0x1234, 0x0000:0x1234 is the same as 0x1234. The second segment, 0x0001 starts at byte 16, so if we used 0x0001:0x1234 it would be equivalent to a different value, 0x1244. With 0x1244, we could just access it directly, so we might not need to use the segment. But for 0x0001:0xfff0, it does allow us to access a memory address we couldn’t without segments, since memory access without segments ends at 0xffff and we can’t access 0x10000 directly.

Because the segments are overlapping and start 16 bytes apart, the address accessed by a segment:offset is given by the formula:

address = segment * 16 + offset

Using segments, we can access 0xb8000 with 0xb800:0x0000.
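One way to double-check this arithmetic is to let the assembler do it. The sketch below uses nasm’s preprocessor; LINEAR is a made-up macro name for the formula above. The file assembles without errors (and produces no output bytes) only if every segment:offset pair gives the address stated.

```nasm
; address = segment * 16 + offset
%define LINEAR(seg, off) ((seg) * 16 + (off))

%if LINEAR(0xb800, 0x0000) != 0xb8000
  %error "0xb800:0x0000 should be 0xb8000"
%endif

%if LINEAR(0x0001, 0x1234) != 0x1244
  %error "0x0001:0x1234 should be 0x1244"
%endif

%if LINEAR(0x0001, 0xfff0) != 0x10000
  %error "0x0001:0xfff0 should be 0x10000"
%endif

; Because segments overlap, a different pair reaches the same address.
%if LINEAR(0xb000, 0x8000) != 0xb8000
  %error "0xb000:0x8000 should also be 0xb8000"
%endif
```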

img
[text]

Result:

img

Note: We are overwriting the first spot in video memory, so we overwrite the “S” of “SeaBIOS” at the top left of the screen, not the first character in the row after “Booting from Hard Disk…”.

Note that you can’t load the es register directly, like in the following:

mov es, 0xb800

You must load another register first and then move it to es:

mov ax, 0xb800
mov es, ax

Note also that out of ax, bx, cx, and dx, only bx can be used as an offset inside of “[es:__]” (si, di, and bp also work). So you cannot use:

mov cx, 0x0000
mov [es:cx], ax
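Taking the two notes above into account, a minimal sketch of a boot sector that writes to video memory could look like this. It is an illustration under my own assumptions (the color value 0x0F and the final “jmp $” are choices made for the sketch), not a copy of the original listing.

```nasm
BITS 16

mov ax, 0xb800              ; es cannot be loaded with a value directly,
mov es, ax                  ; so the value goes through ax

mov bx, 0x0000              ; offset of the first character cell
mov byte [es:bx], '!'       ; character byte at 0xb800:0x0000
mov byte [es:bx+1], 0x0F    ; color byte: white text (F) on black background (0)

jmp $                       ; stop here by jumping to this same instruction forever

times 510-($-$$) db 0
dw 0xAA55
```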

3. Simplest way to read keyboard input

This example prints an exclamation mark if the first key pressed is an 'm', otherwise does nothing.

img
[text]

Result if the first key pressed is an 'm':

img

Explanation:

int 0x16

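With ah set to 0x00, int 0x16 waits for a key to be pressed and then stores the value of the key that was pressed in the ah register (the corresponding ASCII character, if there is one, is returned in al). The value identifying the key pressed is called a “scan code”.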

cmp ah, 0x32

“cmp” compares the values of its first and second argument. It then sets certain bits in the FLAGS register (called EFLAGS in 32-bit mode) depending on the result. If the values of the first and second argument are equal, the computer sets a certain bit, known as the Zero Flag (ZF) bit, to 1, and if the values are not equal, it sets the bit to 0. You can then use another instruction that references the value of that bit to do one action if the bit has the value 1, and another action if the bit has the value 0. That way you can do one action if the two values being compared by “cmp” are equal, and another action if the values are not equal.

In this example, 0x32 is the value associated with the ‘m’ key (that is, the value of the “scan code” associated with ‘m’). The previous “int 0x16” instruction puts the value of the key pressed into the ah register, and then this step compares that value to 0x32 to see if the key pressed was an ‘m’.

je .equal

The “je” instruction tells the computer to start executing certain code if the value of the ZF bit is equal to 1. “je” can be thought of as “jump if equal” because it changes the location of what part of the program is running (aka it “jumps” to a different point in the program) based on whether the bit is equal to 1, that is, if the previous comparison done by “cmp” resulted in the two values being equal to one another.

The programs in the first two steps of this tutorial just started with the first line of the program running, or “executing”, and then continued down with each line running until the end of the program. In this case, if the value of the ZF bit is set to 1, the line of code running changes to the location in the program named by the argument of je. For example, if the ZF bit is 1, this program moves to where the “.equal” is in the program. In this example, that means that the line “jmp .notequal” is skipped, and the next line of code executed will be “mov al, '!'”. A word following a period, like “.equal”, that is used as a location in the program is called a “label”, so we would say this program “jumps” to the “.equal” label. The code at the “.equal” label will print an exclamation point, as we want to print the exclamation point if the “m” is pressed.

jmp .notequal

“jmp” stands for “jump”, and the location of the code that is running jumps to the “.notequal” label. If the key pressed is not an “m”, then the comparison by “cmp ah, 0x32” will result in a 0, for not equal, and we then do not want to print the exclamation mark, so we want to skip over the code following the “.equal” label. Jumping to .notequal lets us skip that code.
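Assembled into one program, the instructions explained in this section might be arranged as in the sketch below. The “start” label and the final “jmp $” are additions made for this illustration; the rest follows the walkthrough.

```nasm
BITS 16

start:
mov ah, 0x00        ; int 0x16 function 0x00: wait for a key press
int 0x16            ; scan code of the key is returned in ah
cmp ah, 0x32        ; 0x32 is the scan code of the 'm' key
je .equal           ; ZF = 1: the key was 'm'
jmp .notequal       ; otherwise skip the printing code

.equal:
mov al, '!'
mov ah, 0x0E
int 0x10            ; print the exclamation point

.notequal:
jmp $               ; nothing more to do

times 510-($-$$) db 0
dw 0xAA55
```

Note that execution falls through from the end of the “.equal” code into “.notequal”, which is fine here because both paths end the same way.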

4. Read multiple keyboard inputs

This example prints an exclamation mark once an “m” is pressed, then ends. Unlike the previous example, it doesn’t have to be the first key pressed, because this example is capable of reading multiple keyboard inputs.

img
[text]

Result:

img

This time the “jmp” instruction moves execution back to the .repeat label, so the “int” instruction can repeatedly read keyboard input. The “je .done” instruction won’t cause a jump until an “m” is input. When the “m” is input, execution jumps to “.done”, the exclamation mark prints, and the program ends.

Because the code:

img

will repeatedly run, starting from the beginning, going down to the “jmp .repeat”, and then starting over, or “looping back” to the beginning at “mov ah, 0x00”, the following section of code is called a “loop”:

img

This example can then be summarized by saying: this example has a loop that “breaks” when “m” is pressed, then it prints an exclamation mark.
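Based on that description, the loop and the code after it could be sketched as follows. The “start” label and the closing “jmp $” are my additions, and the original listing may differ in details.

```nasm
BITS 16

start:
.repeat:
mov ah, 0x00        ; start of the loop: ask int 0x16 to wait for a key
int 0x16            ; scan code returned in ah
cmp ah, 0x32        ; was it 'm'?
je .done            ; yes: break out of the loop
jmp .repeat         ; no: loop back and read another key

.done:
mov al, '!'
mov ah, 0x0E
int 0x10            ; print the exclamation mark
jmp $               ; end of program

times 510-($-$$) db 0
dw 0xAA55
```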

5. Read keyboard input and print without BIOS interrupt

Like the last example, the loop breaks when the user presses an “m”, and then the program prints an exclamation mark.

img
[text]

Explanation:

in al, 0x60

“in” reads input from what are called “ports”. “in” reads the input from the port indicated by its second argument to the register in its first argument. Port 0x60 holds the keyboard input, so this instruction reads the value of the keyboard input to the al register.
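A sketch of how the whole program might fit together, reading the key from port 0x60 and printing through video memory as in section 2, is given below. The label names, the color value 0x0F, and the final “jmp $” are assumptions made for this illustration.

```nasm
BITS 16

start:
.repeat:
in al, 0x60                 ; read the current value of the keyboard port
cmp al, 0x32                ; scan code of 'm'?
je .done
jmp .repeat                 ; keep polling until 'm' is pressed

.done:
mov ax, 0xb800
mov es, ax                  ; es:bx -> first character cell of video memory
mov bx, 0x0000
mov byte [es:bx], '!'       ; character byte
mov byte [es:bx+1], 0x0F    ; color byte: white on black
jmp $

times 510-($-$$) db 0
dw 0xAA55
```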

6. Read multiple keys with BIOS interrupt

This program reads multiple keyboard inputs by reading port 0x60 directly instead of using the BIOS keyboard interrupt. For now we are just printing the value of the scan code associated with the key; in a future section we will convert the key’s scan code to its ASCII value so that what is printed on the screen is what was pressed. Also note that a key has one scan code for when it is pressed and another for when it is released, and this program will print both of these scan codes when a key is pressed and released.

img
[text]

Result, supposing you typed 'a', 'b', and then 'c':

img

When you use int 0x16 to read the keyboard, it gets one keyboard input value and moves it to ah. “in al, 0x60” instead gets whatever is currently in port 0x60. A loop that continuously runs “int 0x16” will keep getting one key at a time. A loop that continuously runs “in al, 0x60” will just keep getting the value of port 0x60. Because a key generates one scan code when it is pressed and another when it is released, a loop that continuously read port 0x60 and printed what was there would keep printing the value of a key for as long as you held it down, and then, once you released the key, keep printing the scan code associated with releasing it until another key was pressed.

This program puts the value of the key it is going to print in the bl register, so that after it prints that key’s value, it can compare the new value read from port 0x60 to the value in bl the next time through the loop. If the value didn’t change, then no new key has been pressed, and it won’t print. The program starts with 0xFA in the bl register, as that appears to be the starting value of port 0x60 before a key is pressed. If bl doesn’t have that value to start, the program will print the symbol for character code 0xFA, which is a little dot called an “interpunct”:

img
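The comparison logic described above can be sketched roughly as follows. This is my own reconstruction under the stated assumptions (0xFA as the starting value, bl as the “last value” register, printing with int 0x10), not the original listing.

```nasm
BITS 16

start:
mov bl, 0xFA        ; value port 0x60 appears to hold before any key is pressed

.repeat:
in al, 0x60         ; current value of the keyboard port
cmp al, bl          ; same as the value we last printed?
je .repeat          ; yes: nothing new, poll again
mov bl, al          ; no: remember the new value...
mov ah, 0x0E
int 0x10            ; ...and print the character whose code equals the scan code
jmp .repeat

times 510-($-$$) db 0
dw 0xAA55
```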

Suppose instead of saving the key to register bl and doing the comparison, you tried the following:

img
[text]

The program would begin printing "interpuncts" repeatedly, and then if the user pressed, for example, a "j", it will repeatedly print a value associated with that key:

img

7. Read input and print multiple scancodes using video memory

img
[text]

This example will take multiple keyboard inputs and each time a key is pressed and released, it will put that scancode at the position of the first text character on the screen:

img

To continue printing to the next positions on the screen, we need to use the “add” instruction.

img
[text]

Result, supposing you typed 'a', 'b', and then 'c':

img

Explanation:

add dx, 2

“add” adds the value of the second argument to the first argument. If dx is equal to 0x0000, adding 2 to it gives 0x0002. When the first argument holds, or will be used as, a location in memory, adding a number to it has the effect of moving that many bytes forward in memory, so this example moves two bytes. A character of text in video memory takes two bytes, one for the character symbol and one for the color, so moving two bytes forward moves to the position of the next character.
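A sketch of how “add dx, 2” might fit into the loop is shown below. Keeping the last value seen in cl, copying dx to bx before the write (since dx cannot be used inside “[es:__]”), and the label names are all assumptions made for this illustration; the original program may be organized differently.

```nasm
BITS 16

start:
mov ax, 0xb800
mov es, ax          ; es -> video memory
mov dx, 0x0000      ; offset of the next character cell to write
mov cl, 0xFA        ; last value seen at port 0x60

.repeat:
in al, 0x60
cmp al, cl          ; unchanged since last time?
je .repeat
mov cl, al          ; remember the new scan code
mov bx, dx          ; copy the offset into a register usable for addressing
mov [es:bx], al     ; write the character byte; the color byte is left as it was
add dx, 2           ; move on to the next character cell
jmp .repeat

times 510-($-$$) db 0
dw 0xAA55
```

The sketch never checks whether dx has gone past the 4000 bytes of the text screen, so after enough key presses it would write beyond the visible area.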

With changes highlighted:

img

8. Read input of single key and print using BIOS interrupt

The following will print the letter “a” each time you press the “a” key, and otherwise won’t print anything. (Note that 0x41 is the ASCII code for uppercase “A”; lowercase “a” is 0x61.)

img
[text]

Result:

img

9. Read and print multiple keys with BIOS interrupt

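A list of scan codes is here. (Scan codes are not the same as ASCII codes, even though both are written as two-digit hexadecimal numbers: a scan code identifies a physical key on the keyboard, while an ASCII code identifies a character.)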

Full program: [text]

Result, after typing some:

img
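For comparison, here is a shorter alternative that is not the full program linked above: function 0x00 of int 0x16 returns the ASCII character of the key in al as well as the scan code in ah, so a loop can print al directly without translating scan codes itself. Treat it as a sketch; keys that have no ASCII character (arrow keys, for example) are not handled specially.

```nasm
BITS 16

start:
.repeat:
mov ah, 0x00        ; int 0x16 function 0x00: wait for a key
int 0x16            ; ah = scan code, al = ASCII character of the key
mov ah, 0x0E        ; int 0x10 function 0x0E: print the character in al
int 0x10
jmp .repeat         ; read and print the next key

times 510-($-$$) db 0
dw 0xAA55
```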

10. Read and print single key repeatedly

Like section 8, but using video memory instead of the BIOS interrupt.

img
[text]

Result:

img

11. Read and print multiple keys

Next we will combine steps 9 and 10 to print all letters to video memory.

Full program is: [text]

Result:

img

12. Read more than 512 bytes from floppy

We want to read code from the boot floppy into memory and then point the instruction pointer to that code, so that we can have more than 512 bytes. We will start with the simplest program from section 1, which was:

img
[text]

As mentioned before, when the computer turns on, it loads 512 bytes of code from the first boot device whose first sector ends with the 0xAA55 boot signature. If more than 512 bytes of code are needed, as nearly every modern program would, that additional code must also be loaded. Each sector of the floppy disk contains 512 bytes. The first sector contains the 512 bytes that are loaded by the computer at start-up, so we will need to load one additional sector.

img
[text]
[text with notes]

Result:

img

Typically, BIOS will load the 512 byte boot sector to memory location 0x7C00. 512 bytes after 0x7C00 is 0x7E00, and we can load our additional code there. Note that some areas of memory are reserved for certain functions (for example, we have seen that video memory is located at 0xB8000). See this link for a sample overview of available memory addresses. This link also contains towards the bottom of the page a listing of available addresses. From the first link, it appears we have from 0x07E00 to 0x7FFFF, at least, free for our programs. This example won't require that much, just a few bytes, but it is something to know for future programs.

You can read from a floppy disk using BIOS Interrupt 0x13 when ah is set to 0x02. It loads the data from the floppy disk to the memory location specified by [es:bx].

When using int 0x13 to read from a floppy, you need to put the drive number in dl. BIOS automatically loads the number of the drive that was used to boot into register dl. A floppy drive is drive number 0, so dl already has the value 0 and we don’t need to do anything else for dl.

You need to put the sector number in cl. Sector numbers start at 1 instead of 0, and sector 1 is the boot sector (the first 512 bytes, which end in 0xAA55), so we want to start with sector 2. To do this, use:

mov cl, 2

To set the number of sectors to read, use al. Supposing we are just reading one more sector:

mov al, 1

To set the cylinder (at least partially, for what is needed for a floppy), use ch. Cylinders start at 0 instead of 1, so we want cylinder 0:

mov ch, 0

To set the head (at least partially, for what is needed for a floppy), use dh. Heads also start at 0 instead of 1, so we want head 0:

mov dh, 0

After loading the data using "int 0x13", we want to jump execution to 0x7E00:

jmp 0x07E0:0000
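Putting the register settings above together, a two-sector program could be sketched as follows. The first sector loads the second one to 0x7E00 and jumps to it; the second sector holds the printing code from section 1. The label names, the “jc” error check (int 0x13 sets the carry flag when a read fails), and the padding of the second sector are additions made for this sketch, not taken from the original listing.

```nasm
BITS 16

start:
mov ax, 0x07E0          ; es:bx = 0x07E0:0x0000, which is address 0x7E00
mov es, ax
mov bx, 0x0000

mov ah, 0x02            ; int 0x13 function 0x02: read sectors into memory
mov al, 1               ; number of sectors to read
mov ch, 0               ; cylinder 0
mov cl, 2               ; sector 2 (sector 1 is the boot sector)
mov dh, 0               ; head 0
                        ; dl still holds the boot drive number set by BIOS
int 0x13
jc .failed              ; carry flag set: the read did not succeed

jmp 0x07E0:0x0000       ; continue with the code that was just loaded

.failed:
jmp .failed             ; nothing sensible to do, so hang

times 510-($-$$) db 0
dw 0xAA55               ; end of the first sector

; ---- second sector, loaded to 0x7E00 ----
mov al, '!'
mov ah, 0x0E
int 0x10                ; print the exclamation point

.hang:
jmp .hang

times 1024-($-$$) db 0  ; pad the second sector to a full 512 bytes
```

The second sector is padded so that the disk image contains a complete 512-byte sector for int 0x13 to read.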

13. Links

Arjun Sreedharan: Kernels 101 - Let's write a kernel; Kernels 201 - Let's write a kernel with keyboard
MikeOS: How to write a simple operating system
Stack Overflow: Main Page; How to write to screen with video memory address 0xb8000 from real mode?; Load segment from floppy with int 13h
OSDev.org: Color Table; More on colors; Memory Map
Intel: Intel 64 and IA-32 Architectures Software Developer Manuals